HARDWARE PREDICATION FOR CONDITIONAL INSTRUCTION PATH BRANCHING

BACKGROUND OF THE INVENTION 
FIELD OF THE INVENTION 

The present invention relates generally to the field of computer systems. More
particularly, the present invention relates to the field of processing instructions having
conditional program execution flow control.
DESCRIPTION OF RELATED ART 

Typical processors execute instructions out of order to help improve instruction
throughput. Such processors typically process instructions through a pipeline that fetches
instructions from memory, decodes each instruction, executes the instruction, and retires the
instruction. The operation of each stage of the pipeline typically overlaps those of the other
stages in time to help process instructions faster.

By identifying instructions that may be executed regardless of whether one or more
previously fetched instructions are executed, typical processors may benefit from executing
instructions in parallel, that is, overlapping the execution of two or more instructions in time,
and/or from executing instructions out of order to avoid stalling on any one instruction, for
example, while waiting for the completed execution of an instruction upon which the stalled
instruction depends. Instructions executed out of order are retired by the pipeline in order.

The pipeline generally fetches instructions of a program in a sequential order as defined
by the program until the program alters its sequential flow with a jump or branch instruction, for
example.

An unconditional branch instruction, for example, identifies a non-sequential target
instruction that is to follow the unconditional branch instruction. The pipeline identifies the
target instruction and then continues fetching instructions of the program starting with the target
instruction. Before identifying the target instruction, however, the pipeline may have already
fetched and started processing one or more instructions sequentially following the unconditional
branch instruction as defined by the program. The alteration in the sequential flow of the
program therefore penalizes the execution of the program, as the pipeline must flush such
instruction(s) and restart fetching and processing at the target instruction. By identifying the
target instruction early in the pipeline, such as in the fetch or decode stage, the pipeline helps
avoid or minimize this penalty.

A conditional branch instruction identifies a non-sequential target instruction that is to
follow the conditional branch instruction if a condition identified by the conditional branch
instruction is satisfied. If the condition is not satisfied, the fall through instruction, that is, the
instruction sequentially following the conditional branch instruction as defined by the program,
is to follow the conditional branch instruction. Because resolution of the condition relies on the
execution of one or more other instructions, the condition may not be resolved when the
conditional branch instruction is fetched. As the pipeline then cannot determine which
instruction is to follow the conditional branch instruction, the pipeline typically predicts whether
the target instruction or the fall through instruction will follow, at the risk of penalizing the
execution of the program if the pipeline later determines the wrong instruction was selected. If,
for example, the pipeline selects the target instruction and the condition is not satisfied,
execution of the program is penalized as the pipeline flushes the target instruction and any
fetched instructions following the target instruction when the conditional branch instruction is
retired and restarts fetching and processing at the fall through instruction.
The pipeline may try to predict how the condition will be resolved, for example, based on
prior executions of the same conditional branch instruction in the program. Typical pipelines,
however, cannot accurately predict how every conditional branch instruction will be resolved
every time and will therefore incur execution penalties due to branch mispredictions.
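History-based prediction of the kind just described is commonly built from per-branch two-bit saturating counters. The Python sketch below assumes that textbook scheme purely for illustration; this description does not specify a predictor, and the class name, function name, and branch address are hypothetical. It replays the resolved outcomes of one branch and counts mispredictions.

```python
from __future__ import annotations


class TwoBitPredictor:
    """Per-branch two-bit saturating counters (hypothetical illustration).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, initial_state: int = 1) -> None:
        self.initial_state = initial_state
        self.counters: dict[int, int] = {}  # branch address -> counter state

    def predict(self, branch_addr: int) -> bool:
        return self.counters.get(branch_addr, self.initial_state) >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        state = self.counters.get(branch_addr, self.initial_state)
        # Move one step toward the actual outcome, saturating at 0 and 3.
        self.counters[branch_addr] = min(state + 1, 3) if taken else max(state - 1, 0)


def count_mispredictions(outcomes: list[bool], branch_addr: int = 0x40) -> int:
    """Replay resolved outcomes (True = condition satisfied) and count misses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses


# A loop-closing branch: taken nine times, then falls through once.
print(count_mispredictions([True] * 9 + [False]))  # 2
# A data-dependent branch whose condition alternates.
print(count_mispredictions([True, False] * 4))     # 8
```

A loop-closing branch is mispredicted only at its first and last executions, while a branch whose condition alternates is mispredicted every time; branches of the second kind are the ones for which a misprediction penalty is hardest to avoid by prediction alone.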

Software predicated instructions, such as a conditional move instruction, may be used to
eliminate or reduce branch instructions and therefore avoid or minimize execution penalties
associated with branch mispredictions. Software predication, however, requires compiler
support to substitute predicated code for branch instructions as well as an instruction set
architecture that provides the software predicated instructions.
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of 
the accompanying drawings, in which like references indicate similar elements and in which: 
Figure 1 illustrates an exemplary computer system comprising a processor having an
instruction pipeline with hardware predication for conditional instruction path branching;

Figure 2 illustrates, for one embodiment, a processor having an instruction pipeline with 
hardware predication for conditional instruction path branching; 

Figure 3 illustrates, for one embodiment, a flow diagram for processing instructions with
hardware predication for conditional instruction path branching;

Figure 4 illustrates, for one embodiment, a fetch/decode unit for the instruction pipeline
of Figure 3;

Figure 5 illustrates, for one embodiment, a dispatch/execute unit for the instruction
pipeline of Figure 3;

Figure 6 illustrates, for one embodiment, a flow diagram for dispatching and executing
conditional micro-operations;

Figure 7 illustrates, for one embodiment, conditional execution circuitry with destination 
bypassing; 

Figure 8 illustrates, for another embodiment, a flow diagram for dispatching and 
executing conditional micro-operations; 

Figure 9 illustrates, for one embodiment, a flow diagram for dispatching and executing
micro-operations dependent on conditional micro-operations; and

Figure 10 illustrates, for one embodiment, dual execution circuitry with destination 
bypassing. 
DETAILED DESCRIPTION

The following detailed description sets forth an embodiment or embodiments in
accordance with the present invention for hardware predication for conditional instruction path
branching. In the following description, details are set forth, such as specific processor
architecture, instruction processing techniques, etc., in order to provide a thorough understanding
of the present invention. It will be evident, however, that the present invention may be practiced
without these details. In other instances, well-known function blocks, interfaces, etc., have not
been described in particular detail so as not to obscure the present invention.

EXEMPLARY COMPUTER SYSTEM 

Figure 1 illustrates an exemplary computer system 100 comprising a processor 102
having an instruction pipeline 200 with hardware predication for conditional instruction path
branching in accordance with the present invention. Although described in the context of
computer system 100, the present invention may be implemented in any suitable computer
system comprising any suitable one or more integrated circuits.

As illustrated in Figure 1, computer system 100 comprises another processor 104 that
may also have an instruction pipeline with hardware predication for conditional instruction path
branching, a processor bus 110, and a chipset 120. Processors 102 and 104 and chipset 120 are
coupled to processor bus 110. Processors 102 and 104 may each comprise any suitable processor
architecture and for one embodiment comprise an Intel® Architecture used, for example, in the
Pentium® family of processors available from Intel® Corporation of Santa Clara, California.
Computer system 100 for other embodiments may comprise one, three, or more processors, any
of which may comprise an instruction pipeline with hardware predication for conditional
instruction path branching.
Chipset 120 for one embodiment comprises a memory controller hub (MCH) 130, an
input/output (I/O) controller hub (ICH) 140, and a firmware hub (FWH) 170. MCH 130, ICH
140, and FWH 170 may each comprise any suitable circuitry and for one embodiment are each
formed as a separate integrated circuit chip. Chipset 120 for other embodiments may comprise
any suitable one or more integrated circuit devices.

MCH 130 may comprise any suitable interface controllers to provide for any suitable
communication link to processor bus 110 and/or to any suitable device or component in
communication with MCH 130. MCH 130 for one embodiment provides suitable arbitration,
buffering, and coherency management for each interface.

MCH 130 is coupled to processor bus 110 and provides an interface to processors 102
and 104 over processor bus 110. Processor 102 and/or processor 104 may alternatively be
combined with MCH 130 to form a single chip. MCH 130 for one embodiment also provides an
interface to a main memory 132 and a graphics controller 134, each coupled to MCH 130. Main
memory 132 stores data and/or instructions, for example, for computer system 100 and may
comprise any suitable memory, such as a dynamic random access memory (DRAM).
Graphics controller 134 controls the display of information on a suitable display 136,
such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to
graphics controller 134. MCH 130 for one embodiment interfaces with graphics controller 134
through an accelerated graphics port (AGP). Graphics controller 134 for one embodiment may
alternatively be combined with MCH 130 to form a single chip.

MCH 130 is also coupled to ICH 140 to provide access to ICH 140 through a hub
interface. ICH 140 provides an interface to I/O devices or peripheral components for computer
system 100. ICH 140 may comprise any suitable interface controllers to provide for any suitable
communication link to MCH 130 and/or to any suitable device or component in communication
with ICH 140. ICH 140 for one embodiment provides suitable arbitration and buffering for each
interface.

For one embodiment, ICH 140 provides an interface to one or more suitable integrated 
drive electronics (IDE) drives 142, such as a hard disk drive (HDD) or compact disc read only 
memory (CD ROM) drive for example, to store data and/or instructions for example, one or 
more suitable universal serial bus (USB) devices through one or more USB ports 144, an audio 
coder/decoder (codec) 146, and a modem codec 148. ICH 140 for one embodiment also 
provides an interface through a super I/O controller 150 to a keyboard 151, a mouse 152, one or 
more suitable devices, such as a printer for example, through one or more parallel ports 153, one 
or more suitable devices through one or more serial ports 154, and a floppy disk drive 155. ICH 
140 for one embodiment further provides an interface to one or more suitable peripheral 
component interconnect (PCI) devices coupled to ICH 140 through one or more PCI slots 162 on 
a PCI bus and an interface to one or more suitable industry standard architecture (ISA) devices 
coupled to ICH 140 by the PCI bus through an ISA bridge 164. ISA bridge 164 interfaces with 
one or more ISA devices through one or more ISA slots 166 on an ISA bus. 

ICH 140 is also coupled to FWH 170 to provide an interface to FWH 170. FWH 170 
may comprise any suitable interface controller to provide for any suitable communication link to 
ICH 140. FWH 170 for one embodiment may share at least a portion of the interface between 
ICH 140 and super I/O controller 150. FWH 170 comprises a basic input/output system (BIOS) 
memory 172 to store suitable system and/or video BIOS software. BIOS memory 172 may 
comprise any suitable non-volatile memory, such as a flash memory for example. 
INSTRUCTION PIPELINE WITH HARDWARE PREDICATION
Processor 102 comprises instruction pipeline 200 with hardware predication for 
conditional instruction path branching to help avoid or minimize any program execution penalty 
due to branch mispredictions. 

As illustrated in Figure 2, processor 102 for one embodiment comprises instruction
pipeline 200 with hardware predication, instruction cache 210, data cache 212, secondary cache
214, bus interface unit 216, and processor architecture registers 218. Bus interface unit 216
couples processor bus 110, instruction cache 210, data cache 212, and secondary cache 214 to one
another. Instruction cache 210, data cache 212, and registers 218 are each coupled to instruction
pipeline 200.

Instruction cache 210, data cache 212, and secondary cache 214 form a two-level cache
memory subsystem to help ensure a steady supply of instructions and data to instruction pipeline
200. Instruction cache 210 and data cache 212 are at a primary cache level and may be accessed
relatively quickly as instruction cache 210 and data cache 212 are each relatively small in size 
and closely coupled to instruction pipeline 200. Secondary cache 214 is at a secondary cache 
level and stores more instructions and data for instruction pipeline 200 relative to instruction 
cache 210 and data cache 212 yet has a slower access time relative to instruction cache 210 and 
data cache 212. 

Instruction cache 210 and/or secondary cache 214 store instructions accessed from main 
memory 132 through bus interface unit 216 for processing by instruction pipeline 200. 
Instruction cache 210 and/or secondary cache 214 may also store recently and/or frequently used 
instructions. Data cache 212 and secondary cache 214 store data accessed from main memory 
132 through bus interface unit 216 for processing by instruction pipeline 200. Data cache 212 
and/or secondary cache 214 may also store recently and/or frequently used data. Instruction
cache 210, data cache 212, and secondary cache 214 may store instructions and/or data in 
accordance with any suitable caching scheme. Although described as comprising instruction 
cache 210, data cache 212, and secondary cache 214, processor 102 may comprise any other 
suitable memory subsystem for storing instructions and data for instruction pipeline 200. 

Instruction pipeline 200 for one embodiment comprises a fetch/decode unit 202, a reorder 
buffer 204, a dispatch/execute unit 206, and a retire unit 208. Fetch/decode unit 202 is coupled 
to instruction cache 210. Reorder buffer 204 is coupled to fetch/decode unit 202, 
dispatch/execute unit 206, and retire unit 208. Dispatch/execute unit 206 is coupled to
fetch/decode unit 202 and data cache 212. Retire unit 208 is coupled to data cache 212 and 
registers 218. 

Instruction pipeline 200 for one embodiment processes instructions of a program in 
accordance with a flow diagram 300 as illustrated in Figure 3. Instruction pipeline 200 may 
process any suitable instruction at any suitable level, such as macro-instructions for example. 
The program for one embodiment defines a sequential order for the instructions of the program 
and comprises one or more conditional branch instructions. As used in this detailed description, 
a conditional branch instruction encompasses any instruction defined to alter the flow of 
execution of instructions of a program based on whether one or more conditions have been 
satisfied. Each conditional branch instruction for one embodiment identifies a condition and a 
target instruction that is to follow the conditional branch instruction if the condition is satisfied. 
The conditional branch instruction may identify any suitable condition and target instruction in 
any suitable manner. Conditional branch instructions are also known as conditional jump 
instructions, for example. 
For block 302 of Figure 3, fetch/decode unit 202 fetches a next instruction of a program
from instruction cache 210. Fetch/decode unit 202 may fetch instructions from the program in
any suitable manner.

Fetch/decode unit 202 for block 304 identifies whether the fetched instruction is a
conditional branch instruction. If so, fetch/decode unit 202 for block 306 identifies whether the
fetched instruction is a qualifying conditional branch instruction and, if so, for block 308 predicts
fall through execution for the identified qualifying conditional branch instruction. The next
instruction fetched by fetch/decode unit 202 will therefore be the instruction sequentially
following the qualifying conditional branch instruction as defined by the program. Fetch/decode
unit 202 may identify conditional branch instructions in any suitable manner and may define and
identify qualifying conditional branch instructions in any suitable manner.

Fetch/decode unit 202 for one embodiment for block 306 identifies whether the fetched
instruction is a conditional forward branch instruction, that is, whether a target instruction
identified by the conditional branch instruction is positioned after the conditional branch
instruction in the sequential order of instructions as defined by the program. Fetch/decode unit
202 for one embodiment for block 306 identifies whether the fetched instruction is a conditional
branch instruction identifying a target instruction within a suitable predetermined number of
instructions from the conditional branch instruction. Fetch/decode unit 202 for one embodiment
for block 306 identifies how predictable the identified conditional branch instruction is, for
example, by determining how often the condition is resolved in the same manner each time the
conditional branch instruction is executed. If fetch/decode unit 202 for block 306 identifies a
conditional branch instruction as a conditional forward branch instruction, as identifying a target
instruction within a suitable predetermined number of instructions from the conditional branch
instruction, and/or as not being predictable within a suitable predetermined degree of accuracy,
fetch/decode unit 202 for block 308 predicts fall through execution for the identified conditional
branch instruction.
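The block 306 tests can be sketched in Python as follows. This is a minimal model under stated assumptions: branch and target positions are given as instruction indices, predictability is measured as the fraction of past executions resolved the same way as the majority outcome, and the thresholds `max_distance=5` and `min_accuracy=0.9` are hypothetical values standing in for the "suitable predetermined" number of instructions and degree of accuracy. The sketch requires all three tests to pass, as in the illustration later in this description; an embodiment may use any one or combination of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BranchInfo:
    branch_index: int  # position of the conditional branch in program order
    target_index: int  # position of the target instruction it identifies
    history: list[bool] = field(default_factory=list)  # past resolutions (True = satisfied)


def predictability(history: list[bool]) -> float:
    """Fraction of past executions resolved the same way as the majority outcome."""
    if not history:
        return 0.0  # no history yet: treat the branch as unpredictable
    satisfied = sum(history)
    return max(satisfied, len(history) - satisfied) / len(history)


def is_qualifying(branch: BranchInfo, max_distance: int = 5, min_accuracy: float = 0.9) -> bool:
    """Hypothetical block 306: forward, short, and hard to predict."""
    distance = branch.target_index - branch.branch_index
    forward = distance > 0
    short = distance <= max_distance
    hard_to_predict = predictability(branch.history) < min_accuracy
    return forward and short and hard_to_predict


def predict_next(branch: BranchInfo) -> str:
    """Block 308 for qualifying branches, otherwise defer to ordinary prediction (block 310)."""
    return "fall through" if is_qualifying(branch) else "use branch predictor"


jc = BranchInfo(branch_index=0, target_index=3, history=[True, False, False, True, False])
loop = BranchInfo(branch_index=10, target_index=4, history=[True] * 20)
print(predict_next(jc))    # fall through
print(predict_next(loop))  # use branch predictor
```

The backward loop branch fails the forward test and is left to the ordinary branch predictor, which handles such highly biased branches well.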

If the fetched instruction is identified as a conditional branch instruction for block 304 yet
is not a qualifying conditional branch instruction as determined for block 306, fetch/decode unit
202 for block 310 predicts whether the identified target instruction or the fall through instruction
will follow, at the risk of penalizing the execution of the program if the wrong instruction was
selected. Fetch/decode unit 202 may perform such branch predictions for block 310 in any
suitable manner.

Fetch/decode unit 202 for block 312 decodes the fetched instruction into one or more 
micro-operations. Fetch/decode unit 202 may decode instructions into any suitable one or more 
micro-operations in any suitable manner. Although described in the context of micro-operations, 
fetch/decode unit 202 for other embodiments may decode the fetched instruction into any 
suitable one or more instructions at any suitable one or more instruction levels. 

Fetch/decode unit 202 for block 314 determines whether the fetched instruction is in a
fall through branch instruction path or any target branch instruction path for an identified
qualifying conditional branch instruction. A fall through branch instruction path for a
conditional branch instruction comprises one or more instructions that are executed only if a
condition for the conditional branch instruction is not satisfied. A target branch instruction path
for a conditional branch instruction comprises one or more instructions that are executed only if
a condition for the conditional branch instruction is satisfied. Because the target instruction for a
conditional branch instruction may be executed regardless of how a condition for the conditional
branch instruction is resolved, not every conditional branch instruction has a target branch
instruction path. Fetch/decode unit 202 may identify instructions in a fall through branch
instruction path and in any target branch instruction path in any suitable manner.

If the fetched instruction is in the fall through branch instruction path or any target branch
instruction path for an identified qualifying conditional branch instruction, fetch/decode unit 202
for block 316 associates a condition for the qualifying conditional branch instruction with each
micro-operation for the fetched instruction. Fetch/decode unit 202 may associate the condition
with each micro-operation for the fetched instruction in any suitable manner. In decoding a
fetched instruction into one or more micro-operations and associating a condition with each such
micro-operation, fetch/decode unit 202 effectively decodes the fetched instruction into one or
more conditional micro-operations.
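A simplified Python model of blocks 314 and 316 follows. It recognizes only two shapes of code around a qualifying forward branch `JC Label`: the instructions between the branch and its target form the fall through branch instruction path and are tagged with the inverse condition `C*`; if that path ends with a forward `JMP`, the instructions from the branch target up to the `JMP` target form the target branch instruction path and are tagged with `C`. The textual instruction format, the `Insn` type, and the helper names are assumptions for illustration, not the circuitry of conditional branch processing unit 408.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Insn:
    text: str                    # e.g. "ADD S1,S2"
    label: Optional[str] = None  # e.g. "Target1"
    cond: Optional[str] = None   # predicate attached here: "C" or "C*" (inverse)


def branch_label(insn: Insn, opcode: str) -> Optional[str]:
    """Return the label named by 'OPCODE Label' or 'OPCODE (Label)', else None."""
    parts = insn.text.split()
    if len(parts) == 2 and parts[0] == opcode:
        return parts[1].strip("()")
    return None


def predicate_paths(prog: list[Insn], branch_idx: int, cond: str = "C") -> list[Insn]:
    """Tag both paths of the qualifying forward 'JC Label' at prog[branch_idx]."""
    index_of = {insn.label: i for i, insn in enumerate(prog) if insn.label}
    target_idx = index_of[branch_label(prog[branch_idx], "JC")]
    # Fall through path: executed only if the condition is NOT satisfied.
    for i in range(branch_idx + 1, target_idx):
        prog[i].cond = cond + "*"
    # If the fall through path ends with a forward JMP, the instructions from the
    # branch target up to the JMP target are executed only if the condition IS satisfied.
    jmp_label = branch_label(prog[target_idx - 1], "JMP")
    if jmp_label is not None:
        for i in range(target_idx, index_of[jmp_label]):
            prog[i].cond = cond
    return prog


program = [
    Insn("JC (Target1)"),
    Insn("ADD S1,S2"),
    Insn("DEC S1"),
    Insn("JMP Target2"),
    Insn("SUB S1,S2", label="Target1"),
    Insn("MUL S4,S5", label="Target2"),
]
for insn in predicate_paths(program, 0):
    prefix = f"{insn.label}:" if insn.label else ""
    tag = f" /{insn.cond}" if insn.cond else ""
    print(f"{prefix:<10}{insn.text}{tag}")
```

Run as written, the example tags ADD, DEC, and JMP with /C* and SUB with /C, and leaves MUL untagged because it executes however the condition resolves.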

As an illustration of how a condition is associated with one or more fetched
instructions, an exemplary program fragment contains the following instructions:

          JC (Target1)
          ADD S1,S2
          DEC S1
Target1:  SUB S1,S2

where JC (Target1) designates to jump or branch to the instruction at Target1 if condition C is
satisfied or to continue with the next sequential instruction if condition C is not satisfied, ADD
S1,S2 designates to add the content of logical register S1 to that of logical register S2 and store
the sum in logical register S1, DEC S1 designates to decrement the content of logical register S1,
and SUB S1,S2 designates to subtract the content of logical register S2 from that of logical
register S1 and store the difference in logical register S1.

When fetch/decode unit 202 fetches the conditional branch instruction JC (Target1),
fetch/decode unit 202 for this illustration identifies JC (Target1) as a qualifying conditional
branch instruction, for example, because JC (Target1) is a forward conditional branch
instruction, identifies the target instruction SUB S1,S2 within five instructions of JC (Target1),
and is not predictable within a predetermined degree of accuracy. Fetch/decode unit 202 predicts
fall through execution for JC (Target1) and therefore continues fetching instructions in sequential
order as defined by the program. As fetch/decode unit 202 fetches and decodes the instructions
in the fall through branch instruction path for JC (Target1), fetch/decode unit 202 associates the
condition C with the one or more micro-operations for each such instruction, effectively
associating the condition C with each such instruction as follows:

          JC (Target1)
          ADD S1,S2   /C*
          DEC S1      /C*
Target1:  SUB S1,S2

The condition C is illustrated in inverse form, that is, as C*, because the instructions in the fall
through branch instruction path are to be executed only if the condition C is not satisfied.

Because the target instruction SUB S1,S2 is to be executed regardless of how the
condition C is resolved, the conditional branch instruction JC (Target1) does not have a target
branch instruction path in this illustration.

As another illustration of how a condition is associated with one or more fetched
instructions, an exemplary program fragment contains the following instructions:

          JC (Target1)
          ADD S1,S2
          DEC S1
          JMP Target2
Target1:  SUB S1,S2
Target2:  MUL S4,S5

where JMP Target2 designates to unconditionally jump to the instruction at Target2 and MUL
S4,S5 designates to multiply the content of logical register S4 by that of logical register S5 and
store the product in logical register S4.

As fetch/decode unit 202 fetches and decodes the instructions in the fall through branch
instruction path for JC (Target1) and the instruction in the target branch instruction path for JC
(Target1), fetch/decode unit 202 associates the condition C with the one or more
micro-operations for each such instruction, effectively associating the condition C with each such
instruction as follows:

          JC (Target1)
          ADD S1,S2   /C*
          DEC S1      /C*
          JMP Target2 /C*
Target1:  SUB S1,S2   /C
Target2:  MUL S4,S5

The instruction in the target branch instruction path, that is, SUB S1,S2, is to be executed only if
the condition C is satisfied.

Fetch/decode unit 202 for block 318 maps any sources and renames any destinations for
each micro-operation for the fetched instruction. Fetch/decode unit 202 may perform mapping
and renaming in any suitable manner.

Fetch/decode unit 202 for block 320 allocates each micro-operation for the fetched
instruction in reorder buffer 204. Fetch/decode unit 202 may allocate each micro-operation in
reorder buffer 204 in any suitable manner.

Fetch/decode unit 202 may comprise any suitable circuitry. As illustrated in Figure 4,
fetch/decode unit 202 for one embodiment comprises an instruction pointer 402, a branch
prediction unit 404, a decoder 406, a conditional branch processing unit 408, and a register alias
table (RAT) and allocate unit 410.

Fetch/decode unit 202 controls instruction pointer 402 to identify for block 302 the next
instruction to be fetched from instruction cache 210 based on inputs, for example, from branch
prediction unit 404, exception/interrupt status, and/or branch misprediction indications from
dispatch/execute unit 206.

Branch prediction unit 404 receives fetched instructions from instruction cache 210,
identifies each fetched conditional branch instruction for block 304, and predicts for block 310
whether a target instruction or the fall through instruction for the conditional branch instruction is
to be fetched next. Branch prediction unit 404 may perform branch predictions in any suitable
manner. Branch prediction unit 404 for one embodiment identifies qualifying conditional branch
instructions for block 306 and predicts fall through execution for block 308. Branch prediction
unit 404 is coupled to instruction cache 210, instruction pointer 402, and dispatch/execute unit
206 and may comprise any suitable circuitry.

Decoder 406 is coupled to instruction cache 210 and receives and decodes each fetched
instruction into one or more micro-operations for block 312. Decoder 406 may comprise any
suitable circuitry to decode each fetched instruction into any suitable one or more
micro-operations in any suitable manner. Decoder 406 for one embodiment decodes each instruction
into one or more triadic micro-operations. A triadic micro-operation comprises an operation
code or opcode and may comprise up to two logical source operands and one logical destination
operand.

Decoder 406 for one embodiment tags a micro-operation for each qualifying conditional
branch instruction to identify the qualifying conditional branch to conditional branch processing
unit 408 and dispatch/execute unit 206. Each qualifying conditional branch may be identified in
any suitable manner by branch prediction unit 404 and/or decoder 406. Decoder 406 may also
decode each qualifying conditional branch instruction in a suitable manner so as to distinguish
micro-operations for qualifying conditional branch instructions from those for other conditional
branch instructions.

Decoder 406 for one embodiment tags a micro-operation for each conditional branch
instruction that is not a qualifying conditional branch instruction with suitable information
identifying the fall through branch instruction path and the predicted instruction path for the
conditional branch instruction to help dispatch/execute unit 206 identify branch mispredictions.
The fall through branch instruction path and the predicted instruction path may be identified in
any suitable manner by branch prediction unit 404 and/or decoder 406.

Conditional branch processing unit 408 receives micro-operations from decoder 406,
identifies micro-operations in the fall through branch instruction path and in any target branch
instruction path for a qualifying conditional branch instruction for block 314, and associates a
condition for the qualifying conditional branch instruction with each such identified
micro-operation for block 316. Conditional branch processing unit 408 for one embodiment identifies
qualifying conditional branch instructions based on micro-operations received from decoder 406.
Conditional branch processing unit 408 is coupled to decoder 406 and may comprise any suitable
circuitry to identify micro-operations in the fall through branch instruction path and in any target
branch instruction path for a qualifying conditional branch instruction and to associate a
condition for the qualifying conditional branch instruction with each such identified
micro-operation in any suitable manner. Conditional branch processing unit 408 for one embodiment
tags each such identified micro-operation with a conditional flag identifying the condition as an
additional source operand for the micro-operation.

RAT and allocate unit 410 receives micro-operations from conditional branch processing
unit 408 and maps any sources and renames any destinations for each micro-operation for block
318. RAT and allocate unit 410 for one embodiment for block 318 converts logical register
references to physical register references and in so doing forms dependency links between
physical destinations and sources using a rename map. For one embodiment where conditional
branch processing unit 408 tags a micro-operation with a conditional flag identifying a condition
for a qualifying conditional branch instruction, RAT and allocate unit 410 attaches to the tagged
micro-operation an identifier of the same physical flag register upon which the qualifying
conditional branch instruction depends.

As an example, fetch/decode unit 202 may decode, map, and rename the
macro-instruction ADD Ldest,Lsource from a fall through branch instruction path for a qualifying
conditional branch instruction into the micro-operation ADD Pdest4 <- (Pdest1, Pdest2),
Pdest3:flag, where Ldest is a logical destination register, Lsource is a logical source register,
ADD Ldest,Lsource designates to add the content of logical register Ldest to that of logical
register Lsource and store the sum in logical register Ldest, Pdest4 is a physical destination
register to store the result of the ADD instruction, Pdest1 is a physical destination register
corresponding to logical register Lsource, Pdest2 is a physical destination register corresponding
to logical register Ldest, and Pdest3 is a physical destination register corresponding to a flag
register to store a conditional flag upon which the qualifying conditional branch instruction
depends.
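The renaming step in this example can be modeled with a small register alias table in Python. The sketch assumes a dictionary rename map, a counter in place of a physical register free list, and a logical register named `FLAGS` holding the condition the qualifying branch depends on; the class, method, and register names are hypothetical. Sources are passed as (Lsource, Ldest) so that the output matches the source order of the micro-operation above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RenamedUop:
    op: str
    pdest: str                   # freshly allocated physical destination
    psrcs: tuple[str, ...]       # physical sources read through the rename map
    pflag: Optional[str] = None  # extra flag source for a conditional micro-op


class RegisterAliasTable:
    def __init__(self, initial_map: dict[str, str], next_free: int) -> None:
        self.map = dict(initial_map)  # logical register -> physical register
        self.next_free = next_free    # hypothetical free list: a simple counter

    def rename(self, op: str, ldest: str, lsrcs: tuple[str, ...],
               conditional: bool = False, flag_reg: str = "FLAGS") -> RenamedUop:
        # Each source links to the physical destination of its latest producer.
        psrcs = tuple(self.map[s] for s in lsrcs)
        # A conditional micro-op also reads the physical flag register that the
        # qualifying conditional branch depends on.
        pflag = self.map[flag_reg] if conditional else None
        # Rename the destination to a new physical register.
        pdest = f"Pdest{self.next_free}"
        self.next_free += 1
        self.map[ldest] = pdest
        return RenamedUop(op, pdest, psrcs, pflag)


rat = RegisterAliasTable({"Lsource": "Pdest1", "Ldest": "Pdest2", "FLAGS": "Pdest3"}, next_free=4)
uop = rat.rename("ADD", "Ldest", ("Lsource", "Ldest"), conditional=True)
print(f"{uop.op} {uop.pdest} <- ({', '.join(uop.psrcs)}), {uop.pflag}:flag")
```

The example prints ADD Pdest4 <- (Pdest1, Pdest2), Pdest3:flag. In this sketch, treating the flag as one more source operand means a scheduler that tracks ordinary data dependencies would also wait for the condition before executing the conditional micro-operation.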
RAT and allocate unit 410 also allocates each micro-operation in reorder buffer 204 for
block 320. In entering micro-operations in reorder buffer 204, RAT and allocate unit 410 for one
embodiment for block 320 adds status information to the micro-operations to prepare them for
out-of-order execution.

RAT and allocate unit 410 is coupled to conditional branch processing unit 408 and
reorder buffer 204 and may comprise any suitable circuitry to perform mapping, renaming, and 
allocation in any suitable manner. 

Reorder buffer 204 receives and stores each micro-operation from fetch/decode unit 202. 
Reorder buffer 204 also stores micro-operations that have already been executed by 
dispatch/execute unit 206 but not yet retired. Reorder buffer 204 may comprise any suitable 
circuitry and for one embodiment comprises an array of content-addressable memory (CAM). 

Dispatch/execute unit 206 for block 322 of Figure 3 dispatches micro-operations stored in 
reorder buffer 204 for execution and executes dispatched micro-operations. Dispatch/execute 
unit 206 schedules and executes micro-operations stored in reorder buffer 204 in accordance 
with data dependencies among such micro-operations and execution resource availability and 
therefore supports out-of-order execution of micro-operations. Dispatch/execute unit 206 stores 
any result of executing a micro-operation with that micro-operation in reorder buffer 204.

Dispatch/execute unit 206 may comprise any suitable circuitry. As illustrated in Figure 
5, dispatch/execute unit 206 for one embodiment comprises a reservation station 502, integer 
execution units 511 and 512, floating point execution units 513 and 514, and a memory interface 
execution unit 515. Each execution unit 511-515 is coupled to reservation station 502. Although 
illustrated as comprising five execution units 511-515, dispatch/execute unit 206 for other 
embodiments may comprise any suitable number of execution units each of which may execute
any suitable type of micro-operation. 

Reservation station 502 is coupled to reorder buffer 204 and scans the status of
micro-operations in reorder buffer 204 to identify micro-operations that are ready to be executed,
such as micro-operations having available source operands, for example. Reservation station 502
for block 322 dispatches each ready micro-operation to an appropriate execution unit 511, 512,
513, 514, or 515 available to execute the micro-operation. Each execution unit 511-515 returns
any result of executing a micro-operation to reorder buffer 204, either directly or through
reservation station 502, to store the result with the corresponding micro-operation in reorder
buffer 204.
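The readiness scan performed by reservation station 502 can be illustrated with a short Python sketch. The `Entry` type, the unit-type labels, and the single-pass policy are assumptions made for illustration; only the unit counts, which mirror the two integer, two floating point, and one memory interface execution units 511-515 of Figure 5, come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Entry:
    name: str
    unit_type: str        # "int", "fp", or "mem" (hypothetical labels)
    sources: List[str]    # physical registers this micro-op reads
    dispatched: bool = False


def dispatch_ready(entries: List[Entry], valid: Set[str],
                   free_units: Dict[str, int]) -> List[str]:
    """One scheduling pass: dispatch every entry whose source operands are
    all valid and for which an execution unit of the needed type is free.
    Entries are scanned oldest first, but a younger ready entry may be
    dispatched ahead of an older one that is still waiting."""
    issued = []
    for entry in entries:
        if entry.dispatched:
            continue
        operands_ready = all(src in valid for src in entry.sources)
        if operands_ready and free_units.get(entry.unit_type, 0) > 0:
            free_units[entry.unit_type] -= 1
            entry.dispatched = True
            issued.append(entry.name)
    return issued


entries = [
    Entry("mul", "int", ["P9"]),        # waits: P9 has not been produced yet
    Entry("add", "int", ["P1", "P2"]),
    Entry("fadd", "fp", ["P3", "P4"]),
]
units = {"int": 2, "fp": 2, "mem": 1}
issued_now = dispatch_ready(entries, {"P1", "P2", "P3", "P4"}, units)
print(issued_now)  # ['add', 'fadd']
```

The older `mul` entry stays behind while younger ready entries proceed, which is the out-of-order behavior described above.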

Retire unit 208 retires executed micro-operations for block 324 of Figure 3. Retire unit
208 for one embodiment scans the status of micro-operations in reorder buffer 204, removes
executed micro-operations from reorder buffer 204, and commits the results of executed
micro-operations to machine state, that is, to registers 218 and to main memory 132 through data
cache 212. Retire unit 208 for one embodiment retires executed micro-operations in accordance
with the order in which their respective instructions were fetched and decoded by fetch/decode
unit 202. Retire unit 208 may comprise any suitable circuitry to retire executed micro-operations
in any suitable manner.
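In-order retirement from reorder buffer 204 can be sketched as follows, assuming a simplified reorder buffer modeled as a Python `deque` of entries whose `result` field is `None` until executed. `RobEntry` and `retire` are hypothetical names, and memory commits through data cache 212 are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class RobEntry:
    dest: str                      # logical register the micro-op writes
    result: Optional[int] = None   # None until the micro-op has executed


def retire(rob: Deque[RobEntry], registers: Dict[str, int]) -> List[str]:
    """Commit executed entries from the head of the reorder buffer in program
    order, stopping at the first entry that has not finished executing."""
    retired = []
    while rob and rob[0].result is not None:
        entry = rob.popleft()
        registers[entry.dest] = entry.result  # commit to machine state
        retired.append(entry.dest)
    return retired


rob = deque([RobEntry("S1", 5), RobEntry("S2"), RobEntry("S3", 7)])
regs: Dict[str, int] = {}
first_pass = retire(rob, regs)    # S3 has executed but must wait behind S2
rob[0].result = 9                 # S2 finishes executing out of order
second_pass = retire(rob, regs)
print(first_pass, second_pass, regs)  # ['S1'] ['S2', 'S3'] {'S1': 5, 'S2': 9, 'S3': 7}
```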

Although described in the context of instruction pipeline 200 as illustrated in Figures 2, 4,
and 5 and in the context of flow diagram 300 of Figure 3, the present invention may be
implemented using any other suitable pipeline architecture that processes instructions in
accordance with any suitable flow diagram.

For another embodiment, fetch/decode unit 202 may allocate micro-operations in
reservation station 502 of dispatch/execute unit 206, for example, rather than in reorder buffer
204 for block 320 of Figure 3. Upon dispatching and executing allocated micro-operations,
dispatch/execute unit 206 may then store any results in reorder buffer 204 for retirement by retire
unit 208.

Instruction pipeline 200 for another embodiment may not decode any instructions fetched
for block 302 of Figure 3 into micro-operations but rather may process each instruction directly
for blocks 314-324.

CONDITIONAL BRANCH INSTRUCTION EXECUTION 
In executing and retiring conditional branch instructions that are not qualifying
conditional branch instructions as determined for block 306 of Figure 3, dispatch/execute unit
206 for one embodiment detects any branch mispredictions, flushes from reorder buffer 204 any
micro-operation(s) for the instruction(s) in the mispredicted instruction path, and signals
fetch/decode unit 202 to restart fetching instructions along the appropriate instruction path. For
one embodiment, fetch/decode unit 202 tags a micro-operation for each such conditional branch
instruction with information identifying the fall through branch instruction path and the predicted
instruction path for the conditional branch instruction. Upon execution of the conditional branch
instruction, dispatch/execute unit 206 may then determine whether the prediction was correct.
For one embodiment, as illustrated in Figure 5, integer execution unit 511 handles executing
micro-operations for conditional branch instructions in this manner.

For each qualifying conditional branch instruction as determined for block 306 of Figure
3, any program execution penalty due to a branch misprediction is avoided or minimized as
instruction pipeline 200 continues to fetch instructions sequentially following the qualifying
conditional branch instruction and processes each instruction in the fall through branch
instruction path and in any target branch instruction path for the qualifying conditional branch
instruction as a conditional instruction, regardless of whether the fall through instruction or the
target instruction is to follow the qualifying conditional branch instruction. In this manner, any
flushing of instructions from instruction pipeline 200 and any refetching of instructions due to a
branch misprediction is avoided.

Unlike software predication, instruction pipeline 200 may execute instructions of existing
instruction set architectures from existing programs without additional compiler help to
substitute code in eliminating branch instructions.

Dispatch/execute unit 206 for one embodiment identifies a micro-operation for a
qualifying conditional branch instruction based on how fetch/decode unit 202 decoded and/or
tagged the conditional branch instruction. Because dispatch/execute unit 206 is to execute each
conditional micro-operation in the fall through branch instruction path and in any target branch
instruction path for the qualifying conditional branch instruction, dispatch/execute unit 206 does
not signal fetch/decode unit 202 to restart instruction fetching despite a branch misprediction.

Dispatch/execute unit 206 executes each conditional micro-operation by performing both
the operation designated by the conditional micro-operation in accordance with the instruction
from which the conditional micro-operation was decoded, to produce a first result, and an
execution bypass operation, to produce a second result the retiring of which preserves the
machine state as if the conditional micro-operation were not executed. The first result or the
second result for each conditional micro-operation is committed to machine state based on how
the condition associated with the conditional micro-operation was resolved.

Dispatch/execute unit 206 may perform any suitable execution bypass operation. For one
embodiment where each of one or more instructions uses the same logical operand as both a
source and a destination for the instruction and fetch/decode unit 202 maps the logical source to
one physical location and the destination to another physical location, dispatch/execute unit 206
performs an execution bypass operation by moving or copying the data at the source physical
location to the destination physical location.

As one example, the instruction ADD S1,S2 is in the fall through branch instruction path
of a qualifying conditional branch instruction and designates to add the content of logical register
S1 to that of logical register S2 and store the sum in logical register S1. Fetch/decode unit 202
decodes this instruction into the conditional micro-operation ADD Pdest4 <= (Pdest1, Pdest2),
Pdest3:flag, where Pdest4 is a physical destination register to store the result of the ADD
instruction, Pdest1 is a physical destination register corresponding to logical register S1, Pdest2
is a physical destination register corresponding to logical register S2, and Pdest3 is a physical
destination register corresponding to a flag register to store a conditional flag upon which the
qualifying conditional branch instruction depends.

If the ADD S1,S2 instruction is to be executed and retired, execution and retirement of
the conditional micro-operation adds the data at Pdest1 to that at Pdest2 and stores the sum at
Pdest4. If the ADD S1,S2 instruction is not to be executed, execution and retirement of the
conditional micro-operation moves or copies the data at Pdest1 to Pdest4. Because logical
register S1 corresponds to Pdest1 before the ADD S1,S2 instruction and to Pdest4 after it,
moving or copying the data at Pdest1 to Pdest4 effectively leaves logical register S1 unchanged
and therefore preserves the machine state as if no micro-operation had been executed and retired.
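The effect of the execution bypass on machine state can be checked with a small Python model, assuming a dictionary-based physical register file and rename map. The function name `run_conditional_add` is hypothetical, and the model ignores timing and the separate execute and retire steps.

```python
from typing import Dict


def run_conditional_add(s1: int, s2: int, execute: bool) -> Dict[str, int]:
    """Model ADD S1,S2 decoded as ADD Pdest4 <= (Pdest1, Pdest2), Pdest3:flag
    and return the architectural values of S1 and S2 afterwards."""
    prf = {"Pdest1": s1, "Pdest2": s2}      # physical register file
    rat = {"S1": "Pdest1", "S2": "Pdest2"}  # logical -> physical mapping
    rat["S1"] = "Pdest4"                    # S1 is the destination, so it is renamed
    if execute:
        prf["Pdest4"] = prf["Pdest1"] + prf["Pdest2"]  # the designated ADD
    else:
        prf["Pdest4"] = prf["Pdest1"]       # execution bypass: copy Pdest1 to Pdest4
    return {reg: prf[preg] for reg, preg in rat.items()}


print(run_conditional_add(3, 4, execute=True))   # {'S1': 7, 'S2': 4}
print(run_conditional_add(3, 4, execute=False))  # {'S1': 3, 'S2': 4}
```

In the second call S1 now lives in a different physical register, but its architectural value is unchanged, which is the property the execution bypass relies on.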

For one embodiment where each of one or more instructions does not use a destination as a
source operand, such as a load instruction for example, fetch/decode unit 202 may add the
destination as a source operand to the instruction in decoding the instruction into one or more
conditional micro-operations. Decoder 406, conditional branch processing unit 408, and/or RAT
and allocate unit 410 of Figure 4, for example, may add the destination as a source operand.
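Adding the destination as a source operand, for an instruction such as a load, can be sketched as follows. The `Uop` fields, the `make_conditional` helper, and the register numbers are hypothetical; the sketch only shows that the previous value of the destination becomes available to an execution bypass operation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Uop:
    opcode: str
    dest: str                             # new physical destination
    sources: Tuple[str, ...]              # physical source operands
    conditional: bool = False
    bypass_source: Optional[str] = None   # operand passed through if not executed


def make_conditional(uop: Uop, old_dest: str) -> Uop:
    """Turn a micro-op into a conditional one. `old_dest` is the physical
    register that held the destination's value before renaming; if the
    micro-op does not already read it, it is appended as a source operand
    so the execution bypass has a value to pass through."""
    if old_dest in uop.sources:
        sources = uop.sources
    else:
        sources = uop.sources + (old_dest,)
    return Uop(uop.opcode, uop.dest, sources, conditional=True, bypass_source=old_dest)


# LOAD Pdest7 <= [Pdest5]; the load's logical destination previously mapped to Pdest6.
load = Uop("LOAD", "Pdest7", ("Pdest5",))
cond_load = make_conditional(load, "Pdest6")
print(cond_load.sources)  # ('Pdest5', 'Pdest6')
```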

Conditional Execution with Destination Bypassing 
Dispatch/execute unit 206 for one embodiment for block 322 dispatches and executes a
conditional micro-operation in accordance with a flow diagram 600 as illustrated in Figure 6.

For block 602 of Figure 6, dispatch/execute unit 206 determines whether a condition
associated with the conditional micro-operation has been resolved. Dispatch/execute unit 206
may determine whether the condition has been resolved in any suitable manner. For one
embodiment where a conditional micro-operation is tagged with a conditional flag as an
additional source operand, the conditional flag is captured when written upon execution of a
micro-operation modifying the conditional flag. Dispatch/execute unit 206 may then determine
that the condition has been resolved by scanning the status of the conditional micro-operation
and determining that this additional source operand has been validated.

Dispatch/execute unit 206 dispatches the conditional micro-operation for execution for
block 604 when the condition for the conditional micro-operation has been resolved as
determined for block 602 and when the conditional micro-operation is otherwise ready to be
executed. Dispatch/execute unit 206 for one embodiment dispatches the conditional
micro-operation to an appropriate execution unit based on the operation designated by the
conditional micro-operation.

Dispatch/execute unit 206 executes the conditional micro-operation by performing the
operation designated by the conditional micro-operation to produce a first result for block 606
and by performing an execution bypass operation to produce a second result for block 608.
Dispatch/execute unit 206 for one embodiment for block 606 performs the operation designated
by the conditional micro-operation on one or more source operands for the conditional
micro-operation to produce the first result. Dispatch/execute unit 206 for one embodiment for
block 608 performs an execution bypass operation by presenting a source operand for the
conditional micro-operation as the second result.

Dispatch/execute unit 206 for block 610 outputs either the first result or the second result 
as the result of executing the conditional micro-operation based on how the condition was 
resolved. Dispatch/execute unit 206 outputs the first result if the condition was resolved such 
that the instruction from which the conditional micro-operation was decoded is to be executed. 
Dispatch/execute unit 206 outputs the second result if the condition was resolved such that the 
instruction from which the conditional micro-operation was decoded is not to be executed. In 
retiring the conditional micro-operation, retire unit 208 for one embodiment stores the output
result in a location designated by a destination operand for the conditional micro-operation. 
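Flow diagram 600 can be summarized as a Python function, under the simplifying assumption that the resolved condition is reduced to a single boolean meaning "the instruction is to be executed". `execute_fd600` and its parameters are hypothetical names, and `None` stands in for an unresolved condition or an unavailable operand.

```python
import operator
from typing import Callable, Optional, Sequence


def execute_fd600(op: Callable[..., int], sources: Sequence[Optional[int]],
                  bypass_index: int, execute: Optional[bool]) -> Optional[int]:
    """One attempt to dispatch and execute a conditional micro-op.
    Returns None if the micro-op cannot be dispatched yet."""
    # Blocks 602/604: dispatch only once the condition is resolved and the
    # source operands are available.
    if execute is None or any(src is None for src in sources):
        return None
    first = op(*sources)                 # block 606: designated operation
    second = sources[bypass_index]       # block 608: execution bypass
    return first if execute else second  # block 610: select the output


print(execute_fd600(operator.add, [3, 4], 0, None))   # None (condition unresolved)
print(execute_fd600(operator.add, [3, 4], 0, True))   # 7
print(execute_fd600(operator.add, [3, 4], 0, False))  # 3
```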

As an illustration of how dispatch/execute unit 206 executes a conditional
micro-operation in accordance with flow diagram 600 of Figure 6, an exemplary instruction ADD
S1,S2 designates to add the content of logical register S1 to that of logical register S2 and store
the result in logical register S1 and is decoded and renamed into the conditional micro-operation
ADD Pdest4 <= (Pdest1, Pdest2), Pdest3:flag. When the condition identified by Pdest3:flag
is resolved and the source operands identified by Pdest1 and Pdest2 are available,
dispatch/execute unit 206 dispatches the conditional micro-operation for execution.

In executing the conditional micro-operation, dispatch/execute unit 206 adds the source
operands identified by Pdest1 and Pdest2 to produce a first result and presents the source
operand identified by Pdest1 as a second result. If the condition identified by Pdest3:flag was
resolved such that the instruction ADD S1,S2 is to be executed, dispatch/execute unit 206
outputs the first result as the result of executing the conditional micro-operation. If the condition
identified by Pdest3:flag was resolved such that the instruction ADD S1,S2 is not to be executed,
dispatch/execute unit 206 outputs the second result as the result of executing the conditional
micro-operation. In retiring the conditional micro-operation, retire unit 208 stores the output
result in a location designated by the destination operand identified by Pdest4.

If the condition identified by Pdest3:flag is resolved such that the instruction ADD S1,S2
is not to be executed, dispatch/execute unit 206 for one embodiment may dispatch the
conditional micro-operation for execution when the source operand identified by Pdest1 is
available, regardless of whether the source operand identified by Pdest2 is available.
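The relaxed readiness check for this embodiment can be sketched as follows. `can_dispatch` is a hypothetical name, and the condition is again reduced to a boolean that is `None` while unresolved.

```python
from typing import Optional


def can_dispatch(execute: Optional[bool], pdest1_ready: bool, pdest2_ready: bool) -> bool:
    """Dispatch check for ADD Pdest4 <= (Pdest1, Pdest2), Pdest3:flag."""
    if execute is None:
        return False                          # flow diagram 600 waits for the condition
    if execute:
        return pdest1_ready and pdest2_ready  # the ADD needs both operands
    return pdest1_ready                       # only the bypassed operand is needed


print(can_dispatch(False, True, False))  # True
print(can_dispatch(True, True, False))   # False
```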

Dispatch/execute unit 206 may execute conditional micro-operations in accordance with 
flow diagram 600 of Figure 6 using any suitable circuitry. Dispatch/execute unit 206 for one 
embodiment dispatches a conditional micro-operation to an execution unit comprising circuitry 
to perform the operation designated by the conditional micro-operation to produce a first result 
and circuitry to perform the execution bypass operation to produce a second result. The execution
unit also comprises circuitry to output the first result or the second result based on how the
condition associated with the conditional micro-operation is resolved.

Figure 7 illustrates, for one embodiment, conditional execution circuitry 700 with
destination bypassing for an execution unit. Conditional execution circuitry 700 may be
implemented in any suitable one or more execution units, such as an integer execution unit
and/or a floating point execution unit, for example.

Conditional execution circuitry 700 comprises a register 711 to store a first source
operand in a conditional micro-operation and a register 712 to store a second source operand, if
any, in the conditional micro-operation. The output of each register 711 and 712 is coupled to a
respective input of an operation unit 721 for performing the operation designated by the
conditional micro-operation. Operation unit 721 may comprise any suitable circuitry to perform
any suitable operation designated by a micro-operation. Operation unit 721 may comprise, for
example, an integer arithmetic logic unit (ALU). The output of operation unit 721 is coupled to
an input of a multiplexer 730 and presents a first result to multiplexer 730. The output of register
711 is also coupled to an input of multiplexer 730 by an execution bypass bus 722 and presents a
second result to multiplexer 730. An execution unit comprising conditional execution circuitry
700, for one embodiment, generates a control input 731 based on a condition associated with the
conditional micro-operation. Multiplexer 730 selectively outputs either the first result or the
second result to a result register 740 based on control input 731.
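A behavioral model of conditional execution circuitry 700 can make the data paths concrete. This Python sketch is illustrative only: the class and attribute names are hypothetical, the operation unit is passed in as a function, and a `True` control input is assumed to select the output of operation unit 721. Holding the control input at `True` also models normal, unconditional execution.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConditionalExecutionCircuit:
    """Behavioral model of conditional execution circuitry 700."""
    operation_unit: Callable[[int, int], int]  # operation unit 721, e.g. an ALU function
    reg_711: int = 0                           # first source operand
    reg_712: int = 0                           # second source operand, if any
    result_740: Optional[int] = None           # result register 740

    def evaluate(self, control_731: bool) -> int:
        first = self.operation_unit(self.reg_711, self.reg_712)  # output of unit 721
        second = self.reg_711                                    # execution bypass bus 722
        # Multiplexer 730: control input 731 selects which result is latched.
        self.result_740 = first if control_731 else second
        return self.result_740


alu = ConditionalExecutionCircuit(operator.add, reg_711=3, reg_712=4)
print(alu.evaluate(control_731=True))   # 7: instruction is to be executed
print(alu.evaluate(control_731=False))  # 3: execution bypass result
```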

For one embodiment, conditional execution circuitry 700 may also be used in normally
executing micro-operations designating a suitable operation that may be performed by operation
unit 721. Each such micro-operation may be executed by an execution unit comprising
conditional execution circuitry 700 such that the control input to multiplexer 730 controls
multiplexer 730 to output the result from operation unit 721.

An execution unit comprising conditional execution circuitry 700, for one embodiment,
outputs the first and/or second results from result register 740 for storage in reorder buffer 204.

Dual Execution with Destination Bypassing 
Dispatch/execute unit 206 for another embodiment for block 322 dispatches and executes
a conditional micro-operation in accordance with a flow diagram 800 as illustrated in Figure 8.

For block 802 of Figure 8, dispatch/execute unit 206 dispatches the conditional
micro-operation for execution when the conditional micro-operation is ready to be executed,
regardless of whether a condition associated with the conditional micro-operation has been
resolved. Dispatch/execute unit 206 for one embodiment dispatches the conditional
micro-operation to an appropriate execution unit based on the operation designated by the
conditional micro-operation.

Dispatch/execute unit 206 executes the conditional micro-operation by performing the
operation designated by the conditional micro-operation to produce a first result for block 804
and by performing an execution bypass operation to produce a second result for block 806.
Dispatch/execute unit 206 for one embodiment for block 804 performs the operation designated
by the conditional micro-operation on one or more source operands for the conditional
micro-operation to produce the first result. Dispatch/execute unit 206 for one embodiment for
block 806 performs an execution bypass operation by presenting a source operand for the
conditional micro-operation as the second result.

Dispatch/execute unit 206 for block 808 associates the condition associated with the
conditional micro-operation with both the first result and the second result and for block 810
outputs both the first result and the second result with the associated condition. Dispatch/execute
unit 206 for one embodiment stores the first result, the second result, and a conditional flag
identifying the associated condition in reorder buffer 204. The conditional flag is captured when
written upon execution of a micro-operation modifying the conditional flag. In retiring the
conditional micro-operation, retire unit 208 for one embodiment may then scan the status of the
conditional micro-operation, determine whether the associated condition has been resolved by
determining whether the conditional flag has been validated, and commit either the first result or
the second result to machine state based on how the condition was resolved.

Retire unit 208 commits the first result to machine state if the associated condition is
resolved such that the instruction from which the conditional micro-operation was decoded is to
be executed. Retire unit 208 commits the second result to machine state if the associated
condition is resolved such that the instruction from which the conditional micro-operation was
decoded is not to be executed.
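Flow diagram 800 differs from flow diagram 600 mainly in when the selection happens: both results travel to reorder buffer 204 and retire unit 208 selects between them. A minimal Python sketch, assuming a dictionary of validated flags, the hypothetical names `DualResult`, `execute_fd800`, and `commit`, and a simplified flag polarity in which `True` means the instruction is to be executed:

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class DualResult:
    first: int   # block 804: result of the designated operation
    second: int  # block 806: execution-bypass result
    flag: str    # block 808: condition associated with both results


def execute_fd800(op: Callable[..., int], sources: Sequence[int],
                  bypass_index: int, flag: str) -> DualResult:
    # Block 802: dispatched without waiting for the condition to resolve.
    # Block 810: both results are output together with the condition.
    return DualResult(op(*sources), sources[bypass_index], flag)


def commit(entry: DualResult, flags: Dict[str, bool]) -> Optional[int]:
    """Retirement: return the value to commit, or None if the conditional
    flag has not been validated yet."""
    if entry.flag not in flags:
        return None
    return entry.first if flags[entry.flag] else entry.second


entry = execute_fd800(operator.add, [3, 4], 0, "C")
print(commit(entry, {}))            # None: condition not yet resolved
print(commit(entry, {"C": True}))   # 7
print(commit(entry, {"C": False}))  # 3
```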

By executing the conditional micro-operation in accordance with flow diagram 800 of 
Figure 8, dispatch/execute unit 206 better tolerates delays in resolving a condition associated 
with the conditional micro-operation and therefore helps avoid or minimize any stalling in 
executing micro-operations. 

If the condition associated with the conditional micro-operation has already been
resolved when dispatch/execute unit 206 dispatches the conditional micro-operation, 
dispatch/execute unit 206 for one embodiment executes the conditional micro-operation to 
output only the appropriate first or second result in association with the resolved condition. 
Dispatch/execute unit 206 for another embodiment executes the conditional micro-operation to 
output both the first and second results in association with the resolved condition. 

Dispatch/execute unit 206 for one embodiment may continue executing any micro- 
operations that depend on an already speculatively executed conditional micro-operation 
regardless of whether a condition associated with the conditional micro-operation has been 
resolved. For one embodiment, any source operands depending on the execution of the 
speculatively executed conditional micro-operation are validated when the first and/or second 
results are output in association with the condition for the speculatively executed conditional 
micro-operation. 

Because conditional micro-operations in the same branch instruction path are associated 
with the same condition, dispatch/execute unit 206 for one embodiment may execute a 
conditional micro-operation that depends on another conditional micro-operation in the same 
branch instruction path by executing the dependent conditional micro-operation in accordance 
with flow diagram 800 of Figure 8, for example, using the first result produced from executing
the other conditional micro-operation. Dispatch/execute unit 206 for one embodiment identifies 
a dependent conditional micro-operation in the same branch instruction path based on the 
condition associated with the dependent conditional micro-operation and the condition associated 
with the first and/or second results for the conditional micro-operation upon which the dependent 
conditional micro-operation depends. 

Dispatch/execute unit 206 may also execute micro-operations that depend on a
conditional micro-operation from outside the branch instruction path for the conditional
micro-operation. Dispatch/execute unit 206 for one embodiment for block 322 dispatches and
executes such a dependent micro-operation in accordance with a flow diagram 900 as illustrated
in Figure 9.

For block 902 of Figure 9, dispatch/execute unit 206 dispatches for execution a micro- 
operation that depends on a conditional micro-operation. For one embodiment, the dependent 
micro-operation may depend on the conditional micro-operation from outside the branch
instruction path for the conditional micro-operation. Dispatch/execute unit 206 dispatches the 
dependent micro-operation when it is ready to be executed regardless of whether a condition 
associated with the conditional micro-operation has been resolved. Dispatch/execute unit 206 
for one embodiment identifies the dependent micro-operation as having a source operand based 
on the first and/or second results and the associated condition from the conditional micro- 
operation. Dispatch/execute unit 206 for one embodiment dispatches the dependent micro- 
operation to an appropriate execution unit based on the operation designated by the dependent 
micro-operation. 

Dispatch/execute unit 206 executes the dependent micro-operation by performing the
operation designated by the dependent micro-operation using the first result of the conditional 
micro-operation to produce a third result for block 904 and by performing the operation 
designated by the dependent micro-operation using the second result of the conditional micro- 
operation to produce a fourth result for block 906. Dispatch/execute unit 206 for one 
embodiment performs the operation designated by the dependent micro-operation on one or more 
source operands, including the first result for block 904 and the second result for block 906, for 
the dependent micro-operation to produce the third result and the fourth result, respectively. 

Dispatch/execute unit 206 for block 908 associates the condition associated with the
conditional micro-operation with both the third result and the fourth result and for block 910
outputs both the third result and the fourth result with the associated condition. Dispatch/execute
unit 206 for one embodiment stores the third result, the fourth result, and a conditional flag
identifying the associated condition in reorder buffer 204. The conditional flag is captured when
written upon execution of a micro-operation modifying the conditional flag. In retiring the
dependent micro-operation, retire unit 208 for one embodiment may then scan the status of the
dependent micro-operation, determine whether the associated condition has been resolved by
determining whether the conditional flag has been validated, and commit either the third result or
the fourth result to machine state based on how the condition was resolved.

Retire unit 208 commits the third result to machine state if the associated condition was
resolved such that the instruction from which the conditional micro-operation was decoded is to
be executed. Retire unit 208 commits the fourth result to machine state if the associated
condition was resolved such that the instruction from which the conditional micro-operation was
decoded is not to be executed.
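Flow diagram 900 can be sketched the same way. Here the dependent micro-operation is modeled, as an assumption, by a one-argument function of the conditional micro-operation's result; the names `DualResult`, `execute_fd900`, and `commit` are hypothetical, and `commit` uses the same simplified flag polarity in which `True` means the instruction is to be executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DualResult:
    first: int   # value if the instruction is to be executed
    second: int  # value if it is not
    flag: str    # associated condition


def execute_fd900(fn: Callable[[int], int], producer: DualResult) -> DualResult:
    """Dependent micro-op: apply its operation to both possible values of the
    conditional micro-op's result (blocks 904 and 906) and carry the same
    condition forward (blocks 908 and 910)."""
    third = fn(producer.first)
    fourth = fn(producer.second)
    return DualResult(third, fourth, producer.flag)


def commit(entry: DualResult, flags: Dict[str, bool]) -> Optional[int]:
    if entry.flag not in flags:
        return None  # conditional flag not yet validated
    return entry.first if flags[entry.flag] else entry.second


producer = DualResult(first=7, second=3, flag="C")  # e.g. 3 + 4, or the bypassed 3
dependent = execute_fd900(lambda v: 5 * v, producer)
print(dependent)                        # DualResult(first=35, second=15, flag='C')
print(commit(dependent, {"C": False}))  # 15
```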

If the condition associated with the conditional micro-operation has already been
resolved when dispatch/execute unit 206 dispatches the dependent micro-operation, 
dispatch/execute unit 206 for one embodiment executes the dependent micro-operation using the 
appropriate first or second result to output only the appropriate third or fourth result, 
respectively, in association with the resolved condition. Dispatch/execute unit 206 for another 
embodiment executes the dependent micro-operation to output both the third and fourth results in 
association with the resolved condition. 

Dispatch/execute unit 206 for one embodiment may continue similarly executing micro- 
operations that depend on micro-operations dependent on conditional micro-operations 
regardless of whether the condition associated with the dependent micro-operation has been 
resolved. 

As an illustration of how dispatch/execute unit 206 executes micro-operations in
accordance with flow diagrams 800 and 900, an exemplary program fragment contains the
following instructions:

         JC (Target1)
         ADD S1,S2
Target1: MUL S3,S1

where JC (Target1) designates to jump or branch to the instruction at Target1 if condition C is
satisfied or to continue with the next sequential instruction if condition C is not satisfied, ADD
S1,S2 designates to add the content of logical register S1 to that of logical register S2 and store
the sum in logical register S1, and MUL S3,S1 designates to multiply the content of logical
register S3 by the content of logical register S1 and store the product in logical register S3. For
this illustration, JC (Target1) is identified as a qualifying conditional branch instruction, and
fetch/decode unit 202 decodes the instruction ADD S1,S2 into a single conditional
micro-operation and decodes the instruction MUL S3,S1 into a single micro-operation.

Before the condition C is resolved, dispatch/execute unit 206 may dispatch and execute
the conditional micro-operation for the instruction ADD S1,S2 in accordance with flow diagram
800 of Figure 8 to produce the sum of the content of the logical registers S1 and S2 as a first
result, to produce the content of the logical register S1 as a second result, and to associate the
condition C with both the first and second results.

Although the instruction MUL S3,S1 is outside the fall through branch instruction path
for the qualifying conditional branch instruction JC (Target1) and depends on the instruction
ADD S1,S2, dispatch/execute unit 206 may also dispatch and execute, before the condition C is
resolved, the micro-operation for the instruction MUL S3,S1 in accordance with flow diagram
900 of Figure 9 to produce the product of the content of the logical register S3 and the first result
as a third result, to produce the product of the content of the logical register S3 and the second
result as a fourth result, and to associate the condition C with both the third and fourth results.

Retire unit 208 commits the first and third results to machine state if the condition C is
resolved such that the instruction ADD S1,S2 is to follow the qualifying conditional branch
instruction JC (Target1), that is, if condition C is not satisfied, and commits the second and
fourth results to machine state if the condition C is resolved such that the instruction MUL S3,S1
is to follow the qualifying conditional branch instruction JC (Target1), that is, if condition C is
satisfied.
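The worked example can be checked end to end with a Python sketch that computes all four results before condition C is known, selects at retirement, and compares the outcome against ordinary branching semantics. The function names are hypothetical, and register values are modeled as plain integers.

```python
from typing import Dict


def predicated(s1: int, s2: int, s3: int, c: bool) -> Dict[str, int]:
    """Hardware-predicated execution of the fragment; c is condition C."""
    first, second = s1 + s2, s1              # ADD S1,S2 (flow diagram 800)
    third, fourth = s3 * first, s3 * second  # MUL S3,S1 (flow diagram 900)
    # Retirement: ADD S1,S2 follows JC (Target1) only if C is not satisfied.
    if not c:
        return {"S1": first, "S2": s2, "S3": third}
    return {"S1": second, "S2": s2, "S3": fourth}


def branching(s1: int, s2: int, s3: int, c: bool) -> Dict[str, int]:
    """Conventional branch semantics, for comparison."""
    if not c:            # fall through: execute ADD S1,S2
        s1 = s1 + s2
    s3 = s3 * s1         # Target1: MUL S3,S1 executes on both paths
    return {"S1": s1, "S2": s2, "S3": s3}


for s1, s2, s3 in [(2, 3, 4), (-1, 5, 7)]:
    for c in (False, True):
        assert predicated(s1, s2, s3, c) == branching(s1, s2, s3, c)

print(predicated(2, 3, 4, c=False))  # {'S1': 5, 'S2': 3, 'S3': 20}
print(predicated(2, 3, 4, c=True))   # {'S1': 2, 'S2': 3, 'S3': 8}
```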

Dispatch/execute unit 206 may execute micro-operations in accordance with flow 
diagrams 800 and 900 using any suitable circuitry. Dispatch/execute unit 206 for one 
embodiment dispatches a conditional micro-operation to an execution unit comprising circuitry 
to perform the operation designated by the conditional micro-operation to produce a first result
and circuitry to perform the execution bypass operation to produce a second result. The
execution unit also comprises circuitry to output the first and/or second results. Dispatch/execute 
unit 206 may comprise any suitable circuitry to associate the condition for the conditional micro- 
operation with the first and/or second results. 

Dispatch/execute unit 206 for one embodiment dispatches a micro-operation that depends 
on a conditional micro-operation to an execution unit comprising circuitry to perform the 
operation designated by the dependent micro-operation to produce a third and/or fourth result. 
The execution unit also comprises circuitry to output the third and/or fourth results. 
Dispatch/execute unit 206 may comprise any suitable circuitry to associate the condition for the 
conditional micro-operation with the third and/or fourth results. 

Dispatch/execute unit 206 for one embodiment comprises suitable circuitry to execute a 
scalar micro-operation having an associated condition as a single instruction multiple data 
(SIMD) micro-operation to execute the scalar micro-operation for each possible condition. 

Figure 10 illustrates, for one embodiment, dual execution circuitry 1000 with destination
bypassing for an execution unit. Dual execution circuitry 1000 may be implemented in any
suitable one or more execution units, such as an integer execution unit and/or a floating point
execution unit, for example. Dual execution circuitry 1000 for one embodiment is implemented
in one or more suitable SIMD execution units.

Dual execution circuitry 1000 comprises a register 1011 to store a source operand in a
dispatched micro-operation and a register 1012 to store a source operand in the dispatched
micro-operation. The output of each register 1011 and 1012 is coupled to a respective input of
an operation unit 1021 for performing the operation designated by the dispatched micro-
operation. Operation unit 1021 may comprise any suitable circuitry to perform any suitable
operation designated by a micro-operation. Operation unit 1021 may comprise, for example, an
integer arithmetic logic unit (ALU). The output of operation unit 1021 is coupled to an input of
a multiplexer 1031 and presents a result to multiplexer 1031. The output of register 1011 is also
coupled to an input of a multiplexer 1033 by an execution bypass bus 1022 and presents a result
to multiplexer 1033.

Dual execution circuitry 1000 also comprises a register 1013 to store a source operand in
a dispatched micro-operation and a register 1014 to store a source operand in the dispatched
micro-operation. The output of each register 1013 and 1014 is coupled to a respective input of
an operation unit 1023 for performing the operation designated by the dispatched micro-
operation. Operation unit 1023 may comprise any suitable circuitry to perform any suitable
operation designated by a micro-operation. Operation unit 1023 may comprise, for example, an
integer arithmetic logic unit (ALU). The output of operation unit 1023 is coupled to an input of
multiplexer 1033 and presents a result to multiplexer 1033. The output of register 1013 is also
coupled to an input of multiplexer 1031 by an execution bypass bus 1024 and presents a result to
multiplexer 1031.

Multiplexer 1031 selectively outputs either the result from operation unit 1021 or the
result from destination bypass bus 1024 to a field 1041 of a result register 1040 based on a first
dual execution control signal 1032. Multiplexer 1033 selectively outputs either the result from
operation unit 1023 or the result from destination bypass bus 1022 to a field 1042 of result
register 1040 based on a second dual execution control signal 1034. An execution unit comprising
dual execution circuitry 1000, for one embodiment, generates dual execution control signals 1032
and 1034 in accordance with the dispatched micro-operation. An execution unit comprising dual
execution circuitry 1000, for one embodiment, derives an appropriate condition from the
dispatched micro-operation and stores suitable information identifying the appropriate condition
in a field 1043 of result register 1040.
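The datapath of Figure 10 can be modeled behaviorally as two operation units, two cross-coupled bypass buses, and two multiplexers. This is a hedged Python sketch rather than a hardware description: the select encoding (0 picks the operation unit, 1 picks the bypass bus), the representation of field 1043 as a plain string, and the names `ResultRegister1040` and `dual_execute` are assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional

BinOp = Callable[[int, int], int]


@dataclass
class ResultRegister1040:
    field_1041: Optional[int] = None
    field_1042: Optional[int] = None
    field_1043: Optional[str] = None  # condition information (encoding assumed)


def dual_execute(op_1021: BinOp, op_1023: BinOp,
                 r1011: int, r1012: int, r1013: int, r1014: int,
                 sig_1032: int, sig_1034: int,
                 condition: Optional[str] = None) -> ResultRegister1040:
    """One pass through dual execution circuitry 1000 (behavioral model)."""
    out_1021 = op_1021(r1011, r1012)  # operation unit 1021
    out_1023 = op_1023(r1013, r1014)  # operation unit 1023
    bus_1022 = r1011                  # bypass bus 1022 feeds multiplexer 1033
    bus_1024 = r1013                  # bypass bus 1024 feeds multiplexer 1031
    # Assumed select encoding: 0 = operation unit result, 1 = bypass bus.
    mux_1031 = out_1021 if sig_1032 == 0 else bus_1024
    mux_1033 = out_1023 if sig_1034 == 0 else bus_1022
    return ResultRegister1040(mux_1031, mux_1033, condition)


# Both operation units selected, as in normal execution.
normal = dual_execute(operator.add, operator.add, 1, 2, 3, 4, 0, 0)
```

Steering one multiplexer to its bypass bus is what lets the same circuit hold both outcomes of a conditional micro-operation side by side in result register 1040.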

To execute a conditional micro-operation for one embodiment, operation unit 1023
performs the operation designated by the conditional micro-operation using a source operand
stored in register 1013 and any source operand stored in register 1014 and outputs a first result to
multiplexer 1033. The source operand stored in register 1013 is also presented to multiplexer
1031 by destination bypass bus 1024 as a second result. Multiplexers 1031 and 1033 may then
be controlled through first and second dual execution control signals 1032 and 1034,
respectively, to output the second and first results, respectively, to fields 1041 and 1042,
respectively, of result register 1040. An appropriate condition is derived, for example, from the
condition associated with the conditional micro-operation and stored in field 1043 of result
register 1040.

To execute a conditional micro-operation for another embodiment, operation unit 1021
performs the operation designated by the conditional micro-operation using a source operand
stored in register 1011 and any source operand stored in register 1012 and outputs a first result to
multiplexer 1031. The source operand stored in register 1011 is also presented to multiplexer
1033 by destination bypass bus 1022 as a second result. Multiplexers 1031 and 1033 may then
be controlled through first and second dual execution control signals 1032 and 1034,
respectively, to output the first and second results, respectively, to fields 1041 and 1042,
respectively, of result register 1040. An appropriate condition is derived, for example, from the
condition associated with the conditional micro-operation and stored in field 1043 of result
register 1040.
If the condition for the conditional micro-operation has already been resolved, an
execution unit comprising dual execution circuitry 1000 may alternatively output only one result
based on the resolved condition to field 1041 or 1042.
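The two embodiments differ only in which operation unit computes and which bypass bus carries the unchanged operand. The minimal Python sketch below assumes a two-operand micro-operation `dest = op(dest, src)` whose destination is also the source held in register 1013 (or 1011), so the bypassed operand is the correct value when the condition fails. The tuple encoding of field 1043 (condition name plus the field that is valid if the condition holds) and the name `execute_conditional` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

BinOp = Callable[[int, int], int]


@dataclass
class ResultRegister1040:
    field_1041: Optional[int]
    field_1042: Optional[int]
    # Assumed encoding: (condition name, field valid if the condition holds).
    field_1043: Tuple[str, int]


def execute_conditional(op: BinOp, dest: int, src: int, condition: str,
                        use_unit_1023: bool = True,
                        resolved: Optional[bool] = None) -> ResultRegister1040:
    first = op(dest, src)  # first result: valid if the condition holds
    second = dest          # second result: bypassed source, valid otherwise
    if use_unit_1023:
        # First embodiment: unit 1023 computes, bus 1024 bypasses to field 1041.
        f1041, f1042, true_field = second, first, 1042
    else:
        # Second embodiment: unit 1021 computes, bus 1022 bypasses to field 1042.
        f1041, f1042, true_field = first, second, 1041
    if resolved is not None:
        # Condition already known: output only the result that will be used.
        keep_1041 = (true_field == 1041) == resolved
        if keep_1041:
            f1042 = None
        else:
            f1041 = None
    return ResultRegister1040(f1041, f1042, (condition, true_field))
```

For example, a conditional ADD with a destination of 5 and a source of 3 places 5 in field 1041 and 8 in field 1042 under the first embodiment, and the reverse under the second.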

To execute, for one embodiment, a micro-operation that depends on a conditional micro-
operation, operation unit 1021 performs the operation designated by the dependent micro-
operation using the first result from the conditional micro-operation and any other source
operand in the dependent micro-operation and outputs a third result to multiplexer 1031. The
first result may be stored in either register 1011 or register 1012, and any other source operand
may be stored in the other register 1011 or 1012. Operation unit 1023 performs the operation
designated by the dependent micro-operation using the second result from the conditional micro-
operation and any other source operand in the dependent micro-operation and outputs a fourth
result to multiplexer 1033. The second result may be stored in either register 1013 or register
1014, and any other source operand may be stored in the other register 1013 or 1014.
Multiplexers 1031 and 1033 may then be controlled through first and second dual execution
control signals 1032 and 1034, respectively, to output the third and fourth results, respectively, to
fields 1041 and 1042, respectively, of result register 1040. An appropriate condition is derived
from the condition associated with the first and/or second results of the conditional micro-
operation and stored in field 1043 of result register 1040.
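Executing a dependent micro-operation amounts to running its operation once for each candidate value of the conditionally produced source. This minimal Python sketch assumes the dependent micro-operation has the form `op(other, source)`, as in the MUL S3,S1 example where S1 comes from the conditional micro-operation; `execute_dependent` and the dictionary layout are hypothetical. The `swap_units` flag covers the alternative embodiment in which the inputs of operation units 1021 and 1023 are exchanged.

```python
import operator
from typing import Callable, Dict

BinOp = Callable[[int, int], int]


def execute_dependent(op: BinOp, first: int, second: int, other: int,
                      swap_units: bool = False) -> Dict[str, int]:
    """Execute a micro-op whose source is the output of a conditional micro-op.

    The dependent op runs once on each candidate value of that source:
    the third result pairs with the first result, the fourth with the second.
    """
    third = op(other, first)
    fourth = op(other, second)
    if not swap_units:
        # Unit 1021 uses the first result (-> field 1041),
        # unit 1023 uses the second result (-> field 1042).
        return {"field_1041": third, "field_1042": fourth}
    # Units exchanged: the fourth result lands in field 1041 instead.
    return {"field_1041": fourth, "field_1042": third}


# MUL S3,S1 with S3 = 2, where S1 is either 8 (first) or 5 (second).
fields = execute_dependent(operator.mul, 8, 5, 2)
```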

To execute, for another embodiment, a micro-operation that depends on a conditional
micro-operation, operation unit 1021 performs the operation designated by the dependent micro-
operation using the second result from the conditional micro-operation and any other source
operand in the dependent micro-operation and outputs a fourth result to multiplexer 1031. The
second result may be stored in either register 1011 or register 1012, and any other source
operand may be stored in the other register 1011 or 1012. Operation unit 1023 performs the
operation designated by the dependent micro-operation using the first result from the conditional
micro-operation and any other source operand in the dependent micro-operation and outputs a
third result to multiplexer 1033. The first result may be stored in either register 1013 or register
1014, and any other source operand may be stored in the other register 1013 or 1014.
Multiplexers 1031 and 1033 may then be controlled through first and second dual execution
control signals 1032 and 1034, respectively, to output the fourth and third results, respectively, to
fields 1041 and 1042, respectively, of result register 1040. An appropriate condition is derived
from the condition associated with the first and/or second results of the conditional micro-
operation and stored in field 1043 of result register 1040.

An execution unit comprising dual execution circuitry 1000 may execute a micro-
operation that depends on another micro-operation, which in turn depends on a conditional
micro-operation, in the same manner as a micro-operation that depends directly on a conditional
micro-operation, except that it uses the third and fourth results of the other dependent micro-
operation. An appropriate condition is derived from the condition associated with the third
and/or fourth results of the other dependent micro-operation and stored in field 1043 of result
register 1040.
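Because each dependent micro-operation again produces a pair of results tagged with the same condition, a chain of dependents can be modeled by applying one helper repeatedly. The Python sketch below is illustrative only: `PairedResult`, `apply_dependent`, and the SUB micro-operation in the example are hypothetical, and pairing each value with a condition outcome is an assumed convention.

```python
import operator
from dataclasses import dataclass
from typing import Callable

BinOp = Callable[[int, int], int]


@dataclass(frozen=True)
class PairedResult:
    if_true: int    # e.g. the first or third result
    if_false: int   # e.g. the second or fourth result
    condition: str  # information carried in field 1043


def apply_dependent(op: BinOp, other: int, src: PairedResult) -> PairedResult:
    """Run op(other, src) on both candidates, carrying the same condition."""
    return PairedResult(op(other, src.if_true), op(other, src.if_false),
                        src.condition)


# Conditional ADD S1,S2 with S1 = 5, S2 = 3 leaves S1 = 8 or S1 = 5.
s1 = PairedResult(if_true=8, if_false=5, condition="C")
s3 = apply_dependent(operator.mul, 2, s1)   # MUL S3,S1 with S3 = 2
s4 = apply_dependent(operator.sub, 20, s3)  # hypothetical SUB S4,S3 with S4 = 20
```

Each link in the chain doubles the work but never has to wait for the condition, which is the trade the dual execution circuitry makes.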

If the condition for the dependent micro-operation has already been resolved, an
execution unit comprising dual execution circuitry 1000 may alternatively execute the dependent
micro-operation to output only one result based on the resolved condition to field 1041 or 1042.
An execution unit comprising dual execution circuitry 1000 may use the appropriate first or
second result from a conditional micro-operation based on the resolved condition in executing a
micro-operation dependent on the conditional micro-operation. An execution unit comprising
dual execution circuitry 1000 may use the appropriate third or fourth result from a micro-
operation dependent on a conditional micro-operation based on the resolved condition in
executing a micro-operation dependent on that dependent micro-operation.
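When the condition is already known, the execution unit can skip the duplicate work and feed only the appropriate candidate into the dependent operation. A minimal Python sketch, assuming the same `op(other, source)` form as above; `execute_with_resolution` is a hypothetical name, and `resolved=None` stands for "condition still pending".

```python
import operator
from typing import Callable, Optional, Tuple, Union

BinOp = Callable[[int, int], int]


def execute_with_resolution(op: BinOp, other: int, first: int, second: int,
                            resolved: Optional[bool]) -> Union[int, Tuple[int, int]]:
    """Dependent micro-op execution that exploits an already-resolved condition.

    first:  source value valid if the condition holds (first/third result)
    second: source value valid otherwise (second/fourth result)
    """
    if resolved is None:
        # Condition still pending: compute both candidate results.
        return (op(other, first), op(other, second))
    # Condition known: pick the appropriate source and compute one result.
    return op(other, first if resolved else second)
```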

For one embodiment, dual execution circuitry 1000 may also be used to execute, in the
normal manner, micro-operations designating a suitable operation that may be performed by
operation unit 1021 and/or operation unit 1023.

An execution unit comprising dual execution circuitry 1000, for one embodiment, outputs
the third and/or fourth results with the associated condition from result register 1040 for storage
in reorder buffer 204.

Optimization 

Instruction pipeline 200 for one embodiment identifies qualifying conditional branch
instructions as those identifying a target instruction within a suitable predetermined number of
instructions from the conditional branch instruction. In this manner, instruction pipeline 200
helps reduce any additional overhead associated with processing instructions in a fall-through
branch instruction path that is not taken. Instruction pipeline 200 for one embodiment may also
identify qualifying conditional branch instructions as those whose target branch instruction
path has less than or equal to a suitable predetermined number of instructions.
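The two qualification tests can be sketched as simple threshold checks. This is a hedged Python illustration: the thresholds of 8 instructions are placeholders for the "suitable predetermined number", distance is measured as the difference in instruction positions, and treating backward branches as never qualifying is an assumption of the sketch, not a statement from the text. `ConditionalBranch` and `is_qualifying` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the text only calls them "suitable predetermined".
MAX_TARGET_DISTANCE = 8
MAX_TARGET_PATH_LEN = 8


@dataclass
class ConditionalBranch:
    index: int         # position of the branch in program order
    target_index: int  # position of its target instruction


def is_qualifying(br: ConditionalBranch,
                  target_path_len: Optional[int] = None,
                  max_distance: int = MAX_TARGET_DISTANCE,
                  max_path_len: int = MAX_TARGET_PATH_LEN) -> bool:
    distance = br.target_index - br.index
    # Assumption: only forward branches that skip a short fall-through
    # path are candidates; backward branches (loops) never qualify here.
    if distance <= 0 or distance > max_distance:
        return False
    # Optional second test on the length of the target branch path.
    if target_path_len is not None and target_path_len > max_path_len:
        return False
    return True
```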

Limiting the number of instructions in either or both branch instruction paths for a
qualifying conditional branch instruction also helps minimize the number of control instructions,
such as backward branches for example, that may be in a non-taken branch instruction path.
Such control instructions may unnecessarily add to overhead by fetching additional unnecessary
instructions for a branch instruction path that is not taken.

Limiting the number of instructions in either or both branch instruction paths for a
qualifying conditional branch instruction also helps minimize the number of instructions that are
outside the branch instruction path and that have a source dependency on instructions in the
branch instruction path. Such dependent instructions may be delayed from execution until
resolution of a condition for the qualifying conditional branch instruction.

Instruction pipeline 200 for one embodiment identifies qualifying conditional branch
instructions as those not being predictable within a suitable predetermined degree of accuracy to
help reduce any overhead associated with processing instructions in a branch instruction path
that is not likely to be taken. Unlike software predication, where compiler decisions are
static, instruction pipeline 200 may then dynamically determine whether to perform predication
or accept a branch prediction upon fetching any conditional branch instruction.
The predictability of a conditional branch instruction may be determined in any suitable
manner. For one embodiment, instruction pipeline 200 determines how often a condition for the
conditional branch instruction is resolved in the same manner each time the conditional branch
instruction is executed. If the condition is resolved in the same manner for greater than or equal
to a suitable predetermined percentage of the number of times the conditional branch instruction
has already been executed, the conditional branch instruction is considered predictable within a
suitable predetermined degree of accuracy.
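One way to read the percentage test is to count how many executions resolved in the more common direction. The Python sketch below takes that interpretation; the 90% threshold is a placeholder for the "suitable predetermined percentage", treating a branch with no history as not yet predictable is an assumption, and `BranchHistory` and `should_predicate` are hypothetical names. Real hardware would likely use bounded counters rather than unbounded integers.

```python
from dataclasses import dataclass


@dataclass
class BranchHistory:
    taken: int = 0
    not_taken: int = 0

    def record(self, taken: bool) -> None:
        if taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def is_predictable(self, threshold_pct: float = 90.0) -> bool:
        total = self.taken + self.not_taken
        if total == 0:
            return False  # assumption: no history means not yet predictable
        same_way = max(self.taken, self.not_taken)
        return 100.0 * same_way / total >= threshold_pct


def should_predicate(history: BranchHistory, threshold_pct: float = 90.0) -> bool:
    """Predicate only branches that are not predictable enough."""
    return not history.is_predictable(threshold_pct)
```

Because the history is updated at run time, the same static branch can move between being predicated and being predicted as its behavior changes, which is the dynamic decision the text contrasts with compiler-based predication.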

If a control instruction occurs in a branch instruction path for a qualifying conditional
branch instruction, instruction pipeline 200 for one embodiment fetches the control instruction
yet delays processing the control instruction any further until a condition for the qualifying
conditional branch instruction is resolved. In this manner, instruction pipeline 200 helps reduce
any additional overhead in processing instructions in additional instruction paths.
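The hold-until-resolved behavior for control instructions can be sketched as a small buffer. This Python illustration is hypothetical throughout: the opcode set, the string instruction format, and the names `BranchPathBuffer`, `fetch`, and `resolve` are assumptions, and the sketch models only the held control instruction itself, leaving the handling of any instructions fetched after it out of scope.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical opcode set for control instructions.
CONTROL_OPCODES = {"JMP", "JC", "JNC", "CALL", "RET"}


@dataclass
class BranchPathBuffer:
    """Instructions fetched from one branch path of a qualifying branch."""
    forwarded: List[str] = field(default_factory=list)  # sent on for decoding
    held: List[str] = field(default_factory=list)       # fetched but stalled

    def fetch(self, instr: str) -> None:
        opcode = instr.split()[0]
        if opcode in CONTROL_OPCODES:
            # Fetch the control instruction but process it no further.
            self.held.append(instr)
        else:
            self.forwarded.append(instr)

    def resolve(self, path_taken: bool) -> List[str]:
        """Release held control instructions only if this path is taken."""
        released = self.held if path_taken else []
        self.held = []
        return released
```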

If instruction pipeline 200 identifies a qualifying conditional branch instruction in a
branch instruction path for another qualifying conditional branch instruction, instruction pipeline
200 for one embodiment may predict fall-through execution for the nested qualifying conditional
branch instruction and associate two conditions with each micro-operation decoded from an
instruction in each nested branch instruction path. That is, the condition associated with each
qualifying conditional branch instruction is associated with each such micro-operation.
Dispatch/execute unit 206 may then dispatch and execute each such micro-operation in a suitable
manner, accounting for each condition associated with the micro-operation.
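A micro-operation inside a nested branch path is valid only if every condition attached to it holds. The Python sketch below is a minimal model under assumptions: each condition is represented as a pair of a branch name and the outcome ("taken" as `True`) under which the micro-operation is valid, resolved branches are given as a dictionary, and `TaggedMicroOp`, `tag_nested`, and `is_valid` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# A condition is (branch name, taken-outcome under which the micro-op is valid).
Condition = Tuple[str, bool]


@dataclass(frozen=True)
class TaggedMicroOp:
    text: str
    conditions: Tuple[Condition, ...]


def tag_nested(text: str, outer: Condition, inner: Condition) -> TaggedMicroOp:
    """A micro-op in a nested branch path carries both branches' conditions."""
    return TaggedMicroOp(text, (outer, inner))


def is_valid(uop: TaggedMicroOp, resolved: Dict[str, bool]) -> Optional[bool]:
    """True if all conditions hold, False if any fails, None if still pending."""
    pending = False
    for name, required in uop.conditions:
        if name not in resolved:
            pending = True
        elif resolved[name] != required:
            return False  # one failed condition is enough to discard the result
    return None if pending else True


# Fall-through is predicted for both branches, so the micro-op is valid
# only if neither branch is taken.
uop = tag_nested("ADD S1,S2", ("B1", False), ("B2", False))
```

Returning `False` as soon as either condition fails lets the result be discarded without waiting for the other branch to resolve.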

In the foregoing description, the invention has been described with reference to specific
exemplary embodiments thereof. It will, however, be evident that various modifications and
changes may be made thereto without departing from the broader spirit or scope of the present 
invention as defined in the appended claims. The specification and drawings are, accordingly, to 
be regarded in an illustrative rather than a restrictive sense. 

What is claimed is: 